1.0  INTRODUCTION


KeepSake  is  a  multiuser  database  kernel  which  extends  a  programming  language  to 
enable  simple  data  structures  to  be  written  to  disc.  KeepSake  write  procedures  take 
a  block  of  data,  write  it  to  disc  and  return  a  pointer  to  that  data.  The  read  procedures 
take a KeepSake pointer and assign the data accessed to a user-defined array. These
procedures  are  used  to  construct  a  network  of  data  and  pointers.  KeepSake  comes 
with  a  sophisticated  suite  of  data  block  management  procedures,  for  example,  disc 
garbage  collection. 

It  is  not  possible  for  the  user  to  call  KeepSake  directly  on  compound  data  structures; 
these  must  be  broken  into  simple  data  structures  before  the  KeepSake  routines  can 
be  used.  This  memorandum  describes  how  to  extend  KeepSake  to  allow  a  user  to 
produce  a  persistent  heap,  i.e.,  to  write  a  complete  data  type  to  disc,  including  any 
references,  without  having  to  unpack  the  data  himself. 

Although  a  KeepSake  discpointer  is  a  simple  structure  it  is  recognised  as  a  "special" 
data  type  and  is  treated  in  a  specific  way  because  of  the  need  to  preserve  the 
contents  of  the  structure.  It  is  therefore  possible  for  the  user  to  persist  a  data  type 
which  contains  KeepSake  discpointers  to  data  which  has  already  been  persisted. 

The  software  produced  is  in  the  form  of  two  procedures,  Persist  and  Unpersist.  It  was 
developed using Algol68 as the user’s programming language on a VAX/VMS
system;  however  the  method  used  could  be  easily  applied  to  other  languages  and 
machines. 

The Persist routine requires information on the type of the object to be persisted and
the object itself. Persist takes as its parameters a KeepSake discfile, a vector of
characters which describes the mode of the data structure and a mainstore reference
to the data structure which is to be persisted. Persist delivers a KeepSake
discpointer as its result. Clearly it is inefficient to describe the data type to be
persisted by a vector of characters, as this requires the user to provide a block of text
twice, once for the compiler and once for the Persist procedure, and discrepancies
between the two are not detected. However, the Persist procedure needs to know the
exact mode of the data type which is to be persisted and has to accept the mode of
any data structure, no matter how complex. This cannot be described without the
use of an infinite union, which is not implemented in languages such as Algol68 and
C. An alternative approach would be to alter the compilers to build this in, as is done
already for read and print procedures. One language which has built-in persistence is
PS-Algol [2]; persistence in this language is totally transparent, there being only one
pointer, the PNTR, for mainstore and filestore alike. In contrast to PS-Algol, Persist
was designed to use specific pointers for filestore data so the user is always aware
whether he is addressing filestore or mainstore.

Unpersist  has  two  parameters,  the  KeepSake  discfile  where  the  data  was  written  and 
the  KeepSake  discpointer  produced  by  Persist.  It  delivers  as  its  result  a  mainstore 
reference  to  the  data  which  had  been  persisted. 

2.0  HOW  TO  USE  PERSIST  AND  UNPERSIST


The procedure Persist takes as its first parameter a KeepSake DISCFILE.
Directions  on  producing  and  initialising  KeepSake  discfiles  are  given  in  [1]. 

The second parameter of Persist is a VECTOR[]CHAR which gives all the
declarations  necessary  to  describe  the  mode  of  the  object  which  is  to  be  persisted. 
Each  mode  in  the  vector  of  characters  must  be  described  in  terms  of  basic  modes 
(INT,  REAL  etc)  or  of  a  mode  which  has  preceded  it  in  the  character  string.  The 
syntax  used  to  describe  the  modes  is  exactly  that  used  in  Algol68,  which  means  that 
the  user  can  use  a  text  editor  to  extract  the  mode  declarations  from  his  program  for 
use in the string. The final mode in the character string must be the mode of the
object which is to be persisted. The following example may help:-


VECTOR[]CHAR id = "MODE M1 = STRUCT(INT i, REAL r),"
                  "M2 = STRUCT(BOOL b, CHAR c),"
                  "M3 = STRUCT(M1 m, M2 n);";


The  third  parameter  of  Persist  is  an  INTEGER  which  is  the  start  address  of  the  data 
to  be  persisted.  This  can  be  produced  by  using  BIOP  99  to  change  the  mode  of  a  REF 
to  an  INTEGER  as  in  the  following  example.  Note  especially  that  in  the  case  of 
vectors  and  arrays  a  single  pointer  to  the  data  is  delivered  by  a  REF  REF  MODE 
and  this  is  the  mode  which  must  be  changed  to  an  INTEGER  by  BIOP  99. 

The  result  of  a  successful  call  of  Persist  is  a  KeepSake  DISCPTR.  The  user  can 
write  this  DISCPTR  away  using  the  normal  KeepSake  routines  or  he  can  incorporate 
it  into  an  Algol  structure  which  can  be  the  parameter  of  a  subsequent  call  of  Persist. 

The  following  lines  of  code  are  all  that  need  to  be  added  to  a  user’s  program  to 
enable  him  to  persist  a  data  structure  of  MODE  EXAMPLE.  The  Persist  procedure 
can  only  be  used  within  a  KeepSake  environment  and  the  KeepSake  DISCFILE  it 
uses  must  have  been  opened  and  initialised  using  standard  KeepSake  routines.  The 
user  must  have  access  to  the  Algol68  module  containing  the  Persist  software. 

DISCPTR discptr;

VECTOR[]CHAR id = "MODE MYMODE = STRUCT(INT i, REF REAL x),"
                  "EXAMPLE = STRUCT(MYMODE m, INT n, REF[]INT data);";

OP(REF EXAMPLE)INT RTI = BIOP 99;

EXAMPLE example := ... ;

discptr := persist(keepsake_discfile, id, RTI(example));

( "discptr" can be filed away using standard KeepSake
  routines to file DISCPTRs or it can be
  incorporated into an Algol structure and
  persisted later )

The procedure Unpersist takes as its parameters a KeepSake DISCFILE and
DISCPTR. It delivers as its result an INTEGER which is the value of the mainstore
pointer to the data type which was persisted. The user must convert this INTEGER
to a REF MODE (or in the case of vectors and arrays a REF REF MODE) with a
BIOP 99.

The  data  structure  EXAMPLE  persisted  above  can  be  unpersisted  by  adding  the 
following  code  to  a  program.  As  with  Persist  the  call  of  Unpersist  must  be  within  a 
KeepSake  context  and  the  KeepSake  DISCFILE  must  have  been  opened  and 
initialised  using  standard  KeepSake  routines.  The  KeepSake  DISCPTR  must  have 
been  delivered  as  the  result  of  a  call  of  Persist. 


EXAMPLE recovered_data;

OP (INT) REF EXAMPLE ITR = BIOP 99;

recovered_data := ITR(unpersist(keepsake_discfile, discptr));

( recovered_data can then be used normally )


3.0  DESCRIPTION  OF  THE  METHOD  USED  BY  PERSIST  AND  UNPERSIST 

3.1  Persisting  data 

Briefly, Persist separates data items into mainstore addresses and literals; it repacks
the data into a contiguous block and creates a separate table to show which
elements of this data block are addresses. KeepSake discpointers are copied to a
separate vector of discpointers and their locations in the data block are remembered
in a second table. The data block, tables and discpointers are then filed away using
basic KeepSake routines. It is necessary to tell KeepSake specifically of any
discpointers included in the user’s data to prevent their being lost during a KeepSake
garbage collection. A more detailed description of the Persist software is given
below.


Persist uses a lexical reader and syntax analyser to convert the vector of characters
which describes the data to be persisted into a format which it can use. Each data
type is described simply as a number of bytes with pointers to those bytes which are
KeepSake discpointers and mainstore addresses. The syntax analyser has to take
account of the way the Algol68 compiler on VAX/VMS represents its data in
mainstore.
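
The conversion from a mode string to a layout description can be sketched as
follows. This is a minimal Python model, not the Persist analyser itself: the
function name parse_modes, the word sizes in BASIC and the treatment of every
REF (including REF[]) as a single word are assumptions made for illustration,
and only flat STRUCT declarations are handled.

```python
import re

# Assumed sizes in words; the real values depend on the compiler's layout.
BASIC = {"INT": 1, "REAL": 2, "BOOL": 1, "CHAR": 1}

def parse_modes(text):
    """Return (name of final mode, {mode name: (size, [offsets of REFs])}).

    Offsets are 0-based word offsets from the start of the structure.
    """
    modes = {name: (size, []) for name, size in BASIC.items()}
    last = None
    for name, fields in re.findall(r"(\w+)\s*=\s*STRUCT\s*\(([^)]*)\)", text):
        size, refs = 0, []
        for field in fields.split(","):
            field = field.strip()
            if re.match(r"REF\b", field):
                refs.append(size)          # this word holds a mainstore address
                size += 1
            else:
                # A mode must be basic or declared earlier in the string,
                # otherwise this lookup raises KeyError.
                fsize, frefs = modes[field.split()[0]]
                refs.extend(size + r for r in frefs)
                size += fsize
        modes[name] = (size, refs)
        last = name
    return last, modes

final, table = parse_modes(
    "MODE MYMODE = STRUCT(INT i, REF REAL x),"
    "EXAMPLE = STRUCT(MYMODE m, INT n, REF[]INT data);"
)
print(final, table[final])
```

Under these assumed sizes the final mode EXAMPLE occupies four words, with
references at word offsets 1 and 3.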

The procedure which copies the data from mainstore to an output buffer is called
recursively. At any call one data type is relocated as follows. Firstly all the bytes
describing a data type, including references and KeepSake discpointers, are copied
from mainstore and appended to an output buffer. At this stage any bytes in the data
type which represent references will be incorrect and still contain the mainstore
addresses. If the data type contains any KeepSake discpointers, copies of them are
appended to a vector of discpointers and the position of the bytes which represent
discpointers in the output buffer is recorded in a separate vector. If the data type
contains references, the procedure is called again to copy the data types pointed to
by the references, and the bytes in the output buffer which represent the references
are reset to be the element number of the output buffer where the first byte of the
data they point to is written. As with discpointers, the position of the reference bytes
in the output buffer is recorded in a separate vector. This recursive method of
relocating data deals with any data type, including multiple linked lists. The
following example illustrates how data is relocated.

Assume the data is to be filed as two vectors of integers. The first vector, V1 say, is
the output buffer containing all the user's data, literals and addresses. The literals
will be correct but the addresses will be with respect to the beginning of the vector
V1. The second vector, V2 say, is the reference table and will have one element for
each reference in the user's data, recording which elements of V1 are references.

Consider  the  following  example 


INT a := 1, b := 2, c := 3;

REAL d := 2.5;

MODE EXAMPLE = STRUCT(INT i, REF INT j, INT k, REF REAL l);
EXAMPLE s = (a, b, c, d);

The  data  type  s  would  be  written  away  as  two  vectors  as  follows 

V1 = (1, 5, 3, 6, 2, 16672, 0)

V2 = (2, 4)


V2[1]=2 tells us that V1[2] is a REF. V1[2]=5 tells us that the data pointed to
by the REF starts at V1[5]. As the REF was to an INTEGER the data occupies only
one element of V1. Now consider V2[2]=4: this tells us that V1[4] is a REF;
V1[4]=6 tells us that the data pointed to by the REF starts at V1[6]. In this case the
REF is to a REAL and the data occupies two elements of V1. (16672, 0) is the
representation on VAX of 2.5 stored in two words.
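
The relocation of the structure s can be reproduced with a small Python
model. This is a sketch only: the names Ref, relocate and vax_f_words are
invented for illustration, each INT is assumed to occupy one element, and a
REAL is assumed to be held in VAX F_floating format split into two 16-bit
words (which is consistent with the pair (16672, 0) quoted for 2.5). NIL
references and shared data are not handled here.

```python
import math

def vax_f_words(x):
    """Encode x as two 16-bit words, assuming VAX F_floating (truncating)."""
    if x == 0:
        return [0, 0]
    m, e = math.frexp(abs(x))               # abs(x) = m * 2**e, 0.5 <= m < 1
    frac = int((2 * m - 1) * (1 << 23))     # 23 bits after the hidden bit
    sign = 0x8000 if x < 0 else 0
    return [sign | ((e + 128) << 7) | (frac >> 16), frac & 0xFFFF]

class Ref:
    """Stands in for a mainstore reference to `target`."""
    def __init__(self, target):
        self.target = target

def relocate(value, v1, v2):
    """Append value to v1; return the 1-based element where it starts."""
    start = len(v1) + 1
    fields = value if isinstance(value, tuple) else (value,)
    pending = []
    for f in fields:                        # first copy the whole data type
        if isinstance(f, Ref):
            pending.append((len(v1), f))
            v1.append(0)                    # placeholder, reset below
        elif isinstance(f, float):
            v1.extend(vax_f_words(f))
        else:
            v1.append(f)
    for index, ref in pending:              # then follow each reference
        v2.append(index + 1)                # record which element is a REF
        v1[index] = relocate(ref.target, v1, v2)
    return start

s = (1, Ref(2), 3, Ref(2.5))                # (a, b, c, d) from the example
v1, v2 = [], []
relocate(s, v1, v2)
print(v1)   # [1, 5, 3, 6, 2, 16672, 0]
print(v2)   # [2, 4]
```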

Before a data type is relocated its reference is checked to see if it is a reference to
NIL. This is a special case; no data is relocated and the bytes representing the
reference are reset to the literal value NIL, which for the Algol68 compiler on
VAX/VMS is 0. In addition to checking for NIL, the reference is checked to see if it
is a reference to data which has already been relocated. This is achieved by
comparing the reference and data length of the data type to be relocated with a table
of these values for data already processed. There are three cases. In the simple case
the data type is a complete copy of data already relocated; no further relocation is
necessary and the bytes representing the reference are reset to be the element
number of the output buffer where the first byte of the data they point to is already
written. In the second case the data type to be relocated contains a data type which
has been processed already, for example, a user persists an element of a vector and
then the complete vector. In this case, after the data type has been relocated and its
reference bytes reset, the original copy of the included data type is removed from the
output buffer and its reference bytes are updated to its new position. The final case
is when the data type to be relocated has some data in common with a previously
processed data type, for example, the user persists two overlapping slices of a large
vector. In this case the software relocates the union of the two data types and resets
their reference bytes accordingly. Duplicated data is removed from the output buffer
and the buffer is compressed. Data persisted in this way is consistent, in that should
a data type contain addresses pointing to a common area of store, for example, a
structure containing an array and a slice of the same array, only one copy of the array
will be persisted and the slice will be a pointer into that array.
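
The NIL check and the simple case of the "already relocated" check can be
sketched in Python as below. This is an illustrative model under stated
assumptions, not the Persist code: mainstore is a word-addressed list mem,
address 0 stands for NIL, a layout is a pair (length, {word offset: layout of
target}), and NIL references are assumed not to be entered in the reference
table. The containment and overlap cases described above are not modelled.

```python
NIL = 0

def relocate_once(mem, addr, layout, out, ref_table, done):
    """Copy the data at mem[addr] into out, sharing data already relocated.

    Returns the 1-based element number of out where the data starts.
    `done` maps (address, length) to that element number.
    """
    length, refs = layout
    key = (addr, length)
    if key in done:                  # simple case: exact copy already written
        return done[key]
    start = len(out) + 1
    done[key] = start                # entered before recursing, so cycles end
    out.extend(mem[addr:addr + length])
    for offset in sorted(refs):
        pos = start - 1 + offset     # 0-based index of the reference in out
        target = mem[addr + offset]
        if target == NIL:
            out[pos] = NIL           # nothing relocated for a NIL reference
        else:
            ref_table.append(pos + 1)
            out[pos] = relocate_once(mem, target, refs[offset],
                                     out, ref_table, done)
    return start

INT = (1, {})
THREE_REFS = (3, {0: INT, 1: INT, 2: INT})   # STRUCT(REF INT p, q, r)

mem = [0] * 20
mem[15] = 42                         # the shared INT
mem[10:13] = [15, 15, NIL]           # p and q share one INT, r is NIL

out, ref_table = [], []
relocate_once(mem, 10, THREE_REFS, out, ref_table, {})
print(out)         # [4, 4, 0, 42]
print(ref_table)   # [1, 2]
```

Both references end up pointing at element 4, so the shared INT is written
once only.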

In  practice  data  cannot  be  copied  directly  to  an  output  buffer  as,  with  large  data 
types,  this  would  cause  storage  problems.  However,  it  is  possible  to  record  sufficient 
data  to  enable  the  buffers  to  be  created,  or  partially  created,  at  the  time  of  data 
transfer. 

3.2  Unpersisting  data

Unpersist uses KeepSake routines to recover the filed data block, discpointers and
tables. The discpointers in the data block are overwritten by the recovered
discpointers using the locational information stored in the second table. The
addresses in the data block are reset using the start address of the data block in
mainstore and information from the first table. The mainstore start address is then
delivered as the result. As an illustration of the method used by Unpersist, consider
the data type EXAMPLE persisted above. The data was stored as two vectors as
follows:-

V1 = (1, 5, 3, 6, 2, 16672, 0)

V2 = (2, 4)

Recovery of this data is simple. Both vectors, V1 the output buffer and V2 the
reference table, are copied into mainstore and the mainstore address of the first
element of V1 is established; say this is called final_address. Elements V1[2] and
V1[4], which are known to be references from vector V2, are incremented by the
value of final_address. All that now remains is to assign the value final_address to a
variable of mode REF EXAMPLE. The above data type did not contain any
discpointers. Had it done so, they would have been recovered, along with the table
describing their location in the output buffer. The bytes representing discpointers in
the output buffer would then have been overwritten by the recovered discpointers. It
is necessary to reset the discpointers in the output buffer because they may have
been given new values during a compacting garbage collection.
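
The reference fix-up can be sketched in Python as follows. This is a model
under stated assumptions rather than the Unpersist code: mainstore is a
word-addressed list mem, the function name unpersist_words is invented, and
discpointers are left out. In this model element n of V1 lands at address
final_address + n - 1, so the increment applied to each reference is
final_address - 1; the exact offset and any scaling by element size on the
real machine are not given in the text.

```python
def unpersist_words(mem, final_address, v1, v2):
    """Copy v1 into mem at final_address and turn element numbers into
    mainstore addresses for every element listed in v2 (1-based)."""
    mem[final_address:final_address + len(v1)] = v1
    for n in v2:
        # Element n of v1 now lives at final_address + n - 1.
        mem[final_address + n - 1] += final_address - 1
    return final_address

mem = [0] * 32
v1 = [1, 5, 3, 6, 2, 16672, 0]
v2 = [2, 4]
s = unpersist_words(mem, 10, v1, v2)

j = mem[s + 1]              # REF INT field: now a mainstore address
l = mem[s + 3]              # REF REAL field
print(mem[j])               # 2
print(mem[l:l + 2])         # [16672, 0]
```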

Storing  the  tables  describing  the  data  type  on  disc  means  that  the  mode  of  the  object 
which has been persisted does not have to be given to Unpersist; an alternative
solution  would  be  to  pass  a  vector  of  characters  describing  the  persisted  mode  to 
Unpersist  and  to  parse  it  to  produce  the  tables. 

4.0  TYPE  CHECKING 

Given access to the compiler, the "ideal" "safe" solution would be to file the
compiler’s description of the data type to be persisted with the persisted data. On
recovery, this description could be compared with the compiler’s description of the
data type to which the recovered data will be assigned. This would ensure that data
is never recovered into an incorrect data type. As Persist does not interact with the
compiler, this is not possible and there is no check that the user has supplied the
correct mode when converting the result of Unpersist although, should data be
recovered to an incorrect mode, it would almost certainly result in an access
violation. The main concern is to ensure that data on disc is not corrupted by a user
inadvertently recovering "rubbish" into a discpointer. Here we rely on KeepSake’s
routine type checking which will not accept discpointers which could not have been
produced by KeepSake for the database in use. This means that an incorrect
discpointer is almost certain to fail when it is submitted to a KeepSake read or write
procedure.

KeepSake [1] is a multiuser database kernel which extends a programming
language  to  enable  simple  data  structures  to  be  stored  on  disc.  This  paper  describes 
a  method  of  extending  the  KeepSake  procedures  to  facilitate  the  production  of  a 
persistent  heap,  in  which  all  the  data  structures  in  the  programming  language  can  be 
stored  on  disc. 